Case Study 4 - Time Series


Time Series Analysis of Seasonal Flu


Martin Garcia, Michael Catalano, Jeremy Otsap, Christian Nava
July 1, 2020

Introduction


The seasonal flu

We are going to use ARIMA to model the seasonal flu.

  1. Extract historical flu data (positive cases over time) – you can choose to model flu patterns at the national or regional/state-level
  2. Build an ARIMA model; determine the appropriate values for (p,d,q)
  3. How well does your model perform on validation data? (Note: you’ll need to create a training and validation set to measure forecast accuracy)
  4. Provide analysis to support your determinations

Weekly flu data are taken from the World Health Organization’s (WHO) FluNet database from October 04, 2010 through June 07, 2020. The data was cleaned and only 4 of the 22 variables were kept: Week, SDATE, SPEC_PROCESSED_NB, and ALL_INF. The latter three were renamed to wk_date.start, total_specimens.tested, and total_flu_cases.positive, respectively. An additional variable, total_flu_cases.percent_positive, was created, which is a percentage of the total specimens processed that tested positive for any strain of influenza. We will use this variable as our target time series variable.

$autplt
 [1]  1.000000000  0.976805102  0.924536826  0.853018652  0.767655386
 [6]  0.670334051  0.562097838  0.446930491  0.328412250  0.210504886
[11]  0.097492674 -0.008848353 -0.106134687 -0.191755787 -0.265062059
[16] -0.327303640 -0.378929113 -0.420902512 -0.452465921 -0.474641016
[21] -0.489073342 -0.497986988 -0.503158896 -0.504427290 -0.501622425
[26] -0.498358721

$freq
  [1] 0.003846154 0.007692308 0.011538462 0.015384615 0.019230769 0.023076923
  [7] 0.026923077 0.030769231 0.034615385 0.038461538 0.042307692 0.046153846
 [13] 0.050000000 0.053846154 0.057692308 0.061538462 0.065384615 0.069230769
 [19] 0.073076923 0.076923077 0.080769231 0.084615385 0.088461538 0.092307692
 [25] 0.096153846 0.100000000 0.103846154 0.107692308 0.111538462 0.115384615
 [31] 0.119230769 0.123076923 0.126923077 0.130769231 0.134615385 0.138461538
 [37] 0.142307692 0.146153846 0.150000000 0.153846154 0.157692308 0.161538462
 [43] 0.165384615 0.169230769 0.173076923 0.176923077 0.180769231 0.184615385
 [49] 0.188461538 0.192307692 0.196153846 0.200000000 0.203846154 0.207692308
 [55] 0.211538462 0.215384615 0.219230769 0.223076923 0.226923077 0.230769231
 [61] 0.234615385 0.238461538 0.242307692 0.246153846 0.250000000 0.253846154
 [67] 0.257692308 0.261538462 0.265384615 0.269230769 0.273076923 0.276923077
 [73] 0.280769231 0.284615385 0.288461538 0.292307692 0.296153846 0.300000000
 [79] 0.303846154 0.307692308 0.311538462 0.315384615 0.319230769 0.323076923
 [85] 0.326923077 0.330769231 0.334615385 0.338461538 0.342307692 0.346153846
 [91] 0.350000000 0.353846154 0.357692308 0.361538462 0.365384615 0.369230769
 [97] 0.373076923 0.376923077 0.380769231 0.384615385 0.388461538 0.392307692
[103] 0.396153846 0.400000000 0.403846154 0.407692308 0.411538462 0.415384615
[109] 0.419230769 0.423076923 0.426923077 0.430769231 0.434615385 0.438461538
[115] 0.442307692 0.446153846 0.450000000 0.453846154 0.457692308 0.461538462
[121] 0.465384615 0.469230769 0.473076923 0.476923077 0.480769231 0.484615385
[127] 0.488461538 0.492307692 0.496153846 0.500000000

$db
  [1]   2.853836   3.371205   1.512745  -3.416015  19.406263   3.955619
  [7]   6.264924   4.168738   1.003093  12.351847  -6.302650   4.925540
 [13]  -2.243765  -2.337566  -1.314618  -5.756434  -2.911888 -12.738661
 [19]  -5.425095  -7.707159  -3.076971 -12.364084 -15.015323  -9.374249
 [25]  -8.927485  -8.828385 -22.509648 -15.285883 -11.355253  -8.429517
 [31] -16.035502 -16.029324 -12.703345 -11.390240  -8.461791 -10.434273
 [37] -26.719742 -15.374167 -18.944671 -10.352591 -10.058433 -19.926264
 [43] -19.652971 -25.808611 -12.083546 -10.620464 -28.086260 -28.758134
 [49] -15.075307 -11.762048 -16.007209 -24.554443 -24.490342 -12.955775
 [55] -18.559671 -26.147727 -27.438279 -22.311166 -20.626683 -16.591153
 [61] -30.394407 -29.515385 -19.653743 -26.984700 -14.688882 -22.302582
 [67] -25.104405 -27.881295 -23.379006 -19.065600 -24.021090 -21.477656
 [73] -28.361080 -27.474549 -18.335920 -24.532227 -32.461248 -31.589200
 [79] -30.190570 -22.779024 -28.008706 -23.449389 -33.046573 -25.891459
 [85] -30.773434 -26.712263 -24.919340 -35.339262 -25.312662 -30.714813
 [91] -35.838483 -27.000893 -26.529210 -27.730788 -24.268416 -41.431646
 [97] -23.163481 -26.623638 -28.840145 -19.560907 -31.244290 -35.633536
[103] -26.591732 -33.486899 -22.012749 -28.718999 -26.980171 -31.778016
[109] -23.171507 -31.410986 -31.471537 -30.545238 -32.304405 -20.590284
[115] -31.923601 -28.602621 -33.050717 -32.016390 -23.014361 -27.260538
[121] -29.887522 -41.389545 -35.334682 -27.951645 -29.230879 -27.120772
[127] -26.224603 -38.840550 -26.487843 -32.438321

$dbz
  [1]   9.9434515  10.0390873  10.1590193  10.2587707  10.2955923  10.2346669
  [7]  10.0506316   9.7265870   9.2524265   8.6234421   7.8395454   6.9051289
 [13]   5.8294341   4.6271537   3.3188019   1.9302183   0.4907195  -0.9697054
 [19]  -2.4221145  -3.8361120  -5.1718807  -6.3719957  -7.3665157  -8.1024670
 [25]  -8.5821543  -8.8702976  -9.0590782  -9.2262617  -9.4156196  -9.6387720
 [31]  -9.8871440 -10.1461994 -10.4070889 -10.6718834 -10.9510846 -11.2560475
 [37] -11.5910644 -11.9489671 -12.3120308 -12.6580333 -12.9691050 -13.2388103
 [43] -13.4731626 -13.6852307 -13.8874313 -14.0864596 -14.2832650 -14.4772358
 [49] -14.6715698 -14.8763558 -15.1074307 -15.3816528 -15.7109127 -16.0971381
 [55] -16.5296771 -16.9859320 -17.4359069 -17.8503412 -18.2097356 -18.5096296
 [61] -18.7590391 -18.9736537 -19.1686016 -19.3545124 -19.5373696 -19.7203435
 [67] -19.9054287 -20.0937435 -20.2848111 -20.4761004 -20.6640321 -20.8466030
 [73] -21.0263837 -21.2119042 -21.4160771 -21.6519489 -21.9274333 -22.2410153
 [79] -22.5800796 -22.9230590 -23.2456747 -23.5294218 -23.7679217 -23.9667776
 [85] -24.1365835 -24.2834523 -24.4027031 -24.4792643 -24.4949989 -24.4393940
 [91] -24.3172889 -24.1487619 -23.9620183 -23.7846098 -23.6376216 -23.5339740
 [97] -23.4795086 -23.4749974 -23.5177365 -23.6021592 -23.7196336 -23.8581640
[103] -24.0029708 -24.1386616 -24.2528342 -24.3397436 -24.4020351 -24.4493389
[109] -24.4943521 -24.5484090 -24.6184940 -24.7065985 -24.8111606 -24.9295055
[115] -25.0598493 -25.2016687 -25.3540736 -25.5129240 -25.6682683 -25.8039182
[121] -25.9004379 -25.9412722 -25.9194873 -25.8413273 -25.7243973 -25.5918512
[127] -25.4661240 -25.3649210 -25.2999912 -25.2776917

(Need to talk about stationarity here and the conditions required for stationarity)

From a visual inspection it is difficult to determine if the mean is constant over time. If the mean were constant over time then every time period would have the same mean, i.e., the mean for December is the same for every . Below is a plot of mean positive rates for every week of the year. The plot clearly shows there are weekly differences.

The correlation between data points (covariance) only depends on how far apart they are in time and not where they are in time (i.e., contstant autocovariance). The ACF plots below split the data in half to see if the autocorrelations (autocovariance) change over time. Autocorrelations that change over time would imply a non-stationary time series. Comparing the first half of the data to the second half of the data shows the ACFs are nearly identical. This suggests the autocovariance of the data is constant over time.

There appears to be cyclic behavior in the data, which could imply seasonality and a non-stationary process. However, time series data with cyclic behavior and no trend or seasonality is considered stationary if the cycles are not of a fixed length. Per the realization plot above and Table 1 below, there does not appear to be a trend, and it is evident that the cycles (measuring from peak to peak) are not of a fixed length. The number of weeks that elapse between peaks for the time series varies between 41 and 63 weeks. Intuitively, this makes sense as the peak of the flu “season” doesn’t necessarily fall on the same week or month every year. Therefore, from a visual inspection of the plots, the data appear to come from a stationary process.

Table 1: Flu Season Peak Week
Flu Season Peak Week Start Date Peak Week Number # of Weeks Between Peaks Positive Rate
2010-2011 January 31, 2011 5 N/A 35.49%
2011-2012 March 12, 2012 11 58 31.90%
2012-2013 December 24, 2012 52 41 38.18%
2013-2014 December 23, 2013 52 52 30.61%
2014-2015 December 22, 2014 52 52 32.37%
2015-2016 March 07, 2016 10 63 28.59%
2016-2017 February 20, 2017 8 50 28.17%
2017-2018 January 08, 2018 2 46 30.50%
2018-2019 February 25, 2019 9 59 29.58%
2019-2020 February 03, 2020 6 49 32.74%
Cochrane-Orcutt Test to Check for Trend

To check if there is a trend, the Cochrane-Orcutt test is employed, which tests the hypothesis that there is no serial correlation in the residuals (i.e., no trend). The Cochrane-Orcutt test yields a p-value of 0.533 and fails to reject the null hypothesis, which suggests there is no trend.

Call:
lm(formula = x ~ t, data = df)

             Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.339934  17.627719   1.097   0.2736
t           -0.055471   0.088742  -0.625   0.5325

Residual standard error: 1.9762 on 257 degrees of freedom
Multiple R-squared:  0.0015 ,  Adjusted R-squared:  -0.0024
F-statistic: 0.4 on 1 and 257 DF,  p-value: < 5.325e-01

Durbin-Watson statistic 
(original):    0.04223 , p-value: 2.763e-57
(transformed): 0.58484 , p-value: 6.989e-31
Dickey-Fuller Test for Stationarity

Employing an augmented Dickey-Fuller test is a more formal approach to check for stationarity. An augmented Dickey-Fuller test helps determine if one or more seasonal factors should be included in the model and tests the null hypothesis that the autoregressive model has a root outside of the unit circle. The test depends on failing to reject the null hypothesis to decide whether there is a unit root present. However, failing to reject the null hypothesis is not evidence that a unit root (i.e., seasonal factor) exists.

A p-value > 0.5 for the augmented Dickey-Fuller test fails to reject the null hypothesis, which means that a unit root, or one or more seasonal factors, may be present. In the case of this data, the augmented Dickey-Fuller test rejects the null hypothesis with a p-value of 0.01, suggesting there are no seasonal factors present and validating the initial visual inspection of the data.

Per the initial visual inspection, the Cochrane-Orcutt test, and the Dickey-Fuller test, the data for the first model will be assumed to be stationary with no trend and no seasonal factors.


    Augmented Dickey-Fuller Test

data:  ts_most_recent_5_years.flu_percent_positive_cases
Dickey-Fuller = -5.081, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Model Selection Methodology

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used in this case study to select candidate models. The AIC and the BIC are measures that score a model based on its log-likelihood and complexity. The AIC aims to reduce the white noise variance in the model and penalizes models that add additional terms. The BIC is concurrently employed as the AIC tends to select higher order models, i.e., it may propose selecting an ARMA (2,2) over ARMA (1,1) model. The BIC imposes stronger penalties for increasing the orders of \(p\) and \(q\) and will tend to select models with fewer parameters. Lower values for AIC and BIC are preferred as they indicate

Candidate Model 1 - ARMA(4,1)


For this candidate model, we will assume that the realization is stationary. We use aic5.wge to identify the models with the lowest AIC and BIC. The AIC identifies a potential ARMA(3,2) model.

---------WORKING... PLEASE WAIT... 


Five Smallest Values of  aic 
p q aic
14 4 1 0.5787330
12 3 2 0.5818852
18 5 2 0.5984171
9 2 2 0.6088504
16 5 0 0.6135119

The BIC identifies a potential ARMA(4,1) model.

---------WORKING... PLEASE WAIT... 


Five Smallest Values of  bic 
p q bic
14 4 1 0.6609025
12 3 2 0.6640548
7 2 0 0.6755905
9 2 2 0.6773250
16 5 0 0.6956814

Both the AIC and BIS methods select an ARMA (4,1) model as the best structure for the data.

Estmimating the parameters of the model using \(p=4\) and \(q=1\) the model takes on the form \((1-2.55B+2.26^2-0.80B^3+0.11^4)X_t = (1-0.95B)a_t, \;\;\;\sigma_a^2 = 2.14\;\).


Coefficients of Original polynomial:  
2.5568 -2.2625 0.8026 -0.1052 

Factor                 Roots                Abs Recip    System Freq 
1-1.9422B+0.9593B^2    1.0123+-0.1327i      0.9794       0.0208
1-0.6145B+0.1097B^2    2.8007+-1.1274i      0.3312       0.0609
  
  

A plot of the forecast for the last year of data, 52 weeks, suggests the model doesn’t do a very good job of predicitng the next year’s worth of data.

Performance Metric

We will use the average squared error (ASE) to measure the goodness of fit of the model (performance). The ASE measure takes the sum of the square of the difference between the forecasted, or predicted, value, \(\hat X_i\), and the actual value, \(X_i\), and takes the average for \(n\) number of observations. A lower ASE value indicates the model made fewer forecast errors.

\[ASE = \frac{\sum(\hat X_i - X_i)^2}{n}\]

The training dataset will be comprised of at least 130 weeks, or 2.5 years of data. The forecast horizon will be used as the validation set.

More explanation goes in here about how a rolling window ASE is better than a single ASE.

Ideally, for the rolling window ASE charts below, a low and steady ASE value (red line) as compared to the observed values (blue line) is preferred. This would indicate that the model did a good job of predicting most, if not all, the observed values. Spikes in the ASE value represent observed values that were not predicted well, areas of large error.

[1] 5.446276
[1] 0.7690026

[1] 12.69977
[1] 2.908173

[1] 37.91318
[1] 21.63031

[1] 53.46084
[1] 45.45367

From the rolling window ASE charts above and Table 2 below the performance of the model deteriorates as the horizon increases.

section here about why median rolling window ASE and not single ASE, or mean rolling window ASE. Maybe use histogram of rolling window ASE values to make your point.

Table 2 below shows that the model does better at forecasting shorter horizons.

Table 2: Rolling Window ASE
Forecast Horizon Mean Rolling Window ASE Median Rolling Window ASE
2 weeks 5.446 0.769
4 weeks 12.700 2.908
3 months 37.913 21.630
6 months 53.461 45.454

Candidate Model 2 - ARIMA(3,2,1) s=52


The first candidate model didn’t do a great job in terms of forecasting longer horizons. Taking a closer look at ACF and spectral densities

There is no clear upward or downward trend in the data; however, seasonal trend is present as evidenced by the cyclic behavior of the realizations. Seasonality causes the time series to be non-stationary because the mean of the data is not constant. The mean value at a point in time may be different that the mean value at a different point in time within a particualr season/cycle (e.g., sales of frozen turkeys are every year around November and December than the rest of the year).

ACF and Spectral Density

The autocorrelations exhibit sinusoidal ACF behavior converging to zero, which is characteristic of complex conjugate roots and second order factors.

Additionaly, when looking at the spectral density, which helps identify the frequency content of a time series, there is a significant peak near 0 suggesting complex roots and seasonality in the data. In the first spectral density plot there is a large peak near zero. When the truncation point is changed to 20, this peak is at approimately 0.01923, or 1/52, which indicates a period of 52 weeks, or one year.

These observations support the conclusion of non-stationarity and suggest we can take a seasonal difference to stationarize the data.

Overfit the data


Coefficients of Original polynomial:  
0.0000 1.0000 

Factor                 Roots                Abs Recip    System Freq 
1+1.0000B             -1.0000               1.0000       0.5000
1-1.0000B              1.0000               1.0000       0.0000
  
  

The only factor of \((1-B^52)\) that has the behavior we expected is \((1-1.9848B+B^2)\). This suggests that using \(Y_t = (1-1.9848B+B^2)X_t\) as the stationarizing transformation is appropriate.

A non-seasonal difference is taken to see if that transformation yields stationary data. After taking that seasonal difference the realizations still appear to have some pattern and most of the autocorrelations are outside the significance bands.

Seasonality is suspected and makes sense for the data. Factors from overfit factor tables can be compared to the non-stationary factors \((1-B^s)\), where \(s\) is the seasonal term. If non-stationary factors match with the factors of \((1-B^s)\) well, then the data can be transformed by the \((1-B^s)\) factor.

The factor tables suggest there is an annual seasonal component (s=1).

Taking the second difference of the data yields a realization that looks like white noise and fewer lags outside the significance bands suggesting stationarized data.

---------WORKING... PLEASE WAIT... 


Five Smallest Values of  aic 
      p    q        aic
24    3    5   1.206270
23    3    4   1.214357
20    3    1   1.215555
50    8    1   1.216192
34    5    3   1.218166
---------WORKING... PLEASE WAIT... 


Five Smallest Values of  bic 
      p    q        bic
20    3    1   1.261137
5     0    4   1.270700
23    3    4   1.287288
24    3    5   1.288317
17    2    4   1.299023

BIC selects ARMA(3,1) model.

We then estimate the parameters of the differenced data with aic5.wge.


Coefficients of Original polynomial:  
0.4768 0.2190 -0.0985 

Factor                 Roots                Abs Recip    System Freq 
1-0.5490B              1.8214               0.5490       0.0000
1+0.4612B             -2.1683               0.4612       0.5000
1-0.3889B              2.5712               0.3889       0.0000
  
  

ARMA(\(p,\,q\)) models assume that the noise component, \(a_t\), of the model is white noise. If the residuals are not white noise, this suggests that further modeling may be necessary to better explain the behavior in the data. We employ a visual inspection and a more formal test to determine if the residuals are white noise.

Visual Inspection of Residuals

The residuals look random per the realization plot, suggesting white noise.

$autplt
 [1]  1.000000000  0.012364334  0.022644699 -0.002517460 -0.016591296
 [6]  0.136368562 -0.023855196 -0.035591531 -0.022353082 -0.077254235
[11] -0.036478329 -0.123718262 -0.062695833 -0.052554838  0.014033983
[16]  0.040132395 -0.003481309 -0.055316529 -0.005445382 -0.003324064
[21]  0.010787437  0.029855223 -0.014746613  0.084682828 -0.043904654
[26]  0.012874838

$freq
  [1] 0.002217295 0.004434590 0.006651885 0.008869180 0.011086475 0.013303769
  [7] 0.015521064 0.017738359 0.019955654 0.022172949 0.024390244 0.026607539
 [13] 0.028824834 0.031042129 0.033259424 0.035476718 0.037694013 0.039911308
 [19] 0.042128603 0.044345898 0.046563193 0.048780488 0.050997783 0.053215078
 [25] 0.055432373 0.057649667 0.059866962 0.062084257 0.064301552 0.066518847
 [31] 0.068736142 0.070953437 0.073170732 0.075388027 0.077605322 0.079822616
 [37] 0.082039911 0.084257206 0.086474501 0.088691796 0.090909091 0.093126386
 [43] 0.095343681 0.097560976 0.099778271 0.101995565 0.104212860 0.106430155
 [49] 0.108647450 0.110864745 0.113082040 0.115299335 0.117516630 0.119733925
 [55] 0.121951220 0.124168514 0.126385809 0.128603104 0.130820399 0.133037694
 [61] 0.135254989 0.137472284 0.139689579 0.141906874 0.144124169 0.146341463
 [67] 0.148558758 0.150776053 0.152993348 0.155210643 0.157427938 0.159645233
 [73] 0.161862528 0.164079823 0.166297118 0.168514412 0.170731707 0.172949002
 [79] 0.175166297 0.177383592 0.179600887 0.181818182 0.184035477 0.186252772
 [85] 0.188470067 0.190687361 0.192904656 0.195121951 0.197339246 0.199556541
 [91] 0.201773836 0.203991131 0.206208426 0.208425721 0.210643016 0.212860310
 [97] 0.215077605 0.217294900 0.219512195 0.221729490 0.223946785 0.226164080
[103] 0.228381375 0.230598670 0.232815965 0.235033259 0.237250554 0.239467849
[109] 0.241685144 0.243902439 0.246119734 0.248337029 0.250554324 0.252771619
[115] 0.254988914 0.257206208 0.259423503 0.261640798 0.263858093 0.266075388
[121] 0.268292683 0.270509978 0.272727273 0.274944568 0.277161863 0.279379157
[127] 0.281596452 0.283813747 0.286031042 0.288248337 0.290465632 0.292682927
[133] 0.294900222 0.297117517 0.299334812 0.301552106 0.303769401 0.305986696
[139] 0.308203991 0.310421286 0.312638581 0.314855876 0.317073171 0.319290466
[145] 0.321507761 0.323725055 0.325942350 0.328159645 0.330376940 0.332594235
[151] 0.334811530 0.337028825 0.339246120 0.341463415 0.343680710 0.345898004
[157] 0.348115299 0.350332594 0.352549889 0.354767184 0.356984479 0.359201774
[163] 0.361419069 0.363636364 0.365853659 0.368070953 0.370288248 0.372505543
[169] 0.374722838 0.376940133 0.379157428 0.381374723 0.383592018 0.385809313
[175] 0.388026608 0.390243902 0.392461197 0.394678492 0.396895787 0.399113082
[181] 0.401330377 0.403547672 0.405764967 0.407982262 0.410199557 0.412416851
[187] 0.414634146 0.416851441 0.419068736 0.421286031 0.423503326 0.425720621
[193] 0.427937916 0.430155211 0.432372506 0.434589800 0.436807095 0.439024390
[199] 0.441241685 0.443458980 0.445676275 0.447893570 0.450110865 0.452328160
[205] 0.454545455 0.456762749 0.458980044 0.461197339 0.463414634 0.465631929
[211] 0.467849224 0.470066519 0.472283814 0.474501109 0.476718404 0.478935698
[217] 0.481152993 0.483370288 0.485587583 0.487804878 0.490022173 0.492239468
[223] 0.494456763 0.496674058 0.498891353

$db
  [1]   2.63065455  -2.65241936  -6.70016751  -0.57155179  -0.34814824
  [6]  -3.52668810  -8.12534582 -11.01568211  -6.83942551  -4.32221653
 [11]   2.38292220   0.67723763   4.39289268   4.21667036   3.07788119
 [16]   1.29704652  -6.48898073   0.77558262  -2.30901417   6.05538490
 [21]   4.96870487   4.36445550  -7.51349202   4.27321754  -1.08727017
 [26] -28.03519758  -3.66968882   3.28612185   2.66982487  -7.25737891
 [31]  -0.26341085   1.45280999  -0.24308183  -1.34464609  -5.31965376
 [36]  -1.96081949   4.23884276  -6.91032206  -2.96938380 -10.38054263
 [41]  -0.33943613 -36.45438866 -10.62432851 -12.51540540  -5.20595752
 [46]  -1.69632506  -5.11308708   3.56435378 -13.03298220  -3.58851983
 [51] -12.97629185  -6.88653409 -11.47883143  -1.44502690 -11.68822608
 [56]   6.45615251  -6.71341307  -2.16773325  -5.28834876  -1.92658063
 [61] -14.67157387  -7.74728833   7.47637215  -1.56702866   5.98648954
 [66]  -3.75943188   0.57362682 -10.52316048 -13.10429110  -2.55424635
 [71]   0.08097251   5.22548628  -7.80712973  -5.77331613  -2.00792890
 [76]  -8.40345371  -8.64044738  -7.61105525  -2.42739733   5.51468103
 [81]  -2.23262205   2.76776434 -12.93041625   2.22238743  -2.18480996
 [86]  -5.62560338  -2.19045381  -0.66747595   1.71825331   4.56340883
 [91]   6.51653567 -10.85214694   5.35625316  -1.09142991  -4.42831048
 [96]  -8.28325669  -5.18063912  -7.47904988   6.68186203   2.34180105
[101]  -5.73640967  -7.23303446   0.13361678  -1.64024914  -0.41147884
[106]   4.83176988  -3.00230406  -2.45382965   2.07339219  -2.65396693
[111] -11.48764529 -13.11401712  -1.58749065  -5.05299736  -1.72720877
[116]   0.19455676  -1.96796013   5.48105630   4.09364378  -9.70141855
[121]  -7.12515721 -27.53156635  -0.94045656  -7.64523815  -1.25429926
[126]   1.65399481  -6.21441221   0.04721608 -24.63988814  -3.52857820
[131]   0.84705382   1.85084347  -2.14753763  -3.88947421  -6.39087486
[136] -12.09674125  -1.06566735  -5.58206737 -13.47117413 -13.96774202
[141]   4.85392971   2.49302745  -2.13807962  -8.42825004  -3.94536190
[146]  -3.71425511 -15.08592493  -8.49243193  -0.68668717   1.67816435
[151]  -5.21762105   3.21560648   0.49150051   3.32446562  -8.74561118
[156] -15.21931542  -4.42013554  -1.79361764  -2.15239243   1.14831216
[161]   6.21496497  -5.93462855 -39.75614744  -9.12991040 -12.44591252
[166] -16.74408444  -9.69576940   7.51129002  -0.11562729  -7.23637459
[171]   1.05388313  -0.92235036  -5.78224722  -8.71114388  -3.23003612
[176]   2.88088465   7.56335665   1.51406995  -2.85473618   7.47910888
[181]  -3.86947974  -6.81828892  -4.71836733  -0.16607783  -1.64748071
[186]   0.67346988   2.65810413   1.43693964   1.95644872 -15.42745508
[191]  -7.57379808  -1.29669993   1.74238618   5.15129067   4.53556959
[196]  -2.06541207   3.45925927  -1.38656672  -6.56167997 -11.12640355
[201]  -2.26929151  -1.46160339  -6.43093509   2.08933407   0.33812955
[206] -19.42804806  -1.78110250 -15.75263284 -31.56954378  -9.86260020
[211] -13.37199742   0.94158691   5.94643395  -3.26174306   1.78880418
[216]  -5.64106698 -14.15005027  -0.96541101  -8.37093189  -3.63783209
[221]   1.97098357   2.67959742   3.42330407  -6.14772827 -17.37247807

$dbz
  [1] -1.492253045 -1.435021054 -1.337168653 -1.196458327 -1.012172588
  [6] -0.786483913 -0.525106032 -0.237011975  0.066619577  0.373813884
 [11]  0.673128314  0.954618474  1.210361057  1.434592965  1.623583480
 [16]  1.775353771  1.889330007  1.965986072  2.006509326  2.012509314
 [21]  1.985782183  1.928139381  1.841305370  1.726884025  1.586387123
 [26]  1.421311595  1.233246410  1.023986173  0.795627836  0.550629531
 [31]  0.291816929  0.022332566 -0.254463220 -0.535116358 -0.816191653
 [36] -1.094330616 -1.366215278 -1.628416490 -1.877136122 -2.107892217
 [41] -2.315237292 -2.492632564 -2.632610669 -2.727327286 -2.769510911
 [46] -2.753673997 -2.677296503 -2.541623174 -2.351802107 -2.116317542
 [51] -1.845916392 -1.552361958 -1.247322528 -0.941572410 -0.644540375
 [56] -0.364145525 -0.106824488  0.122341381  0.319467482  0.481720001
 [61]  0.607206578  0.694914300  0.744697449  0.757312207  0.734491282
 [66]  0.679046355  0.594978897  0.487569608  0.363404102  0.230280904
 [71]  0.096944329 -0.027400262 -0.133788360 -0.214344289 -0.263125712
 [76] -0.276768164 -0.254756519 -0.199250728 -0.114529416 -0.006220845
 [81]  0.119479122  0.256446472  0.399050817  0.542360220  0.682177438
 [86]  0.814972893  0.937774411  1.048057439  1.143660357  1.222733032
 [91]  1.283715037  1.325333811  1.346611697  1.346873219  1.325748693
 [96]  1.283175818  1.219406077  1.135026344  1.031007044  0.908785618
[101]  0.770387122  0.618572116  0.456985648  0.290261575  0.124017781
[106] -0.035331766 -0.181010158 -0.306571233 -0.406785263 -0.478548139
[111] -0.521577887 -0.538679197 -0.535472512 -0.519650846 -0.499960734
[116] -0.485144092 -0.483027211 -0.499849842 -0.539843318 -0.605019051
[121] -0.695118655 -0.807691610 -0.938290837 -1.080797960 -1.227898146
[126] -1.371711304 -1.504549434 -1.619716789 -1.712221571 -1.779253855
[131] -1.820323520 -1.837036420 -1.832581590 -1.811065180 -1.776836870
[136] -1.733920142 -1.685605354 -1.634218725 -1.581052881 -1.526434243
[141] -1.469901625 -1.410470484 -1.346953271 -1.278297849 -1.203896971
[146] -1.123818720 -1.038916120 -0.950794806 -0.861645904 -0.773978147
[151] -0.690300384 -0.612809048 -0.543126278 -0.482118454 -0.429807206
[156] -0.385369444 -0.347211271 -0.313093479 -0.280283975 -0.245715863
[161] -0.206138967 -0.158266178 -0.098930035 -0.025273242  0.065007222
[166]  0.173362227  0.300069899  0.443954523  0.602222281  0.770478556
[171]  0.942952433  1.112903678  1.273147440  1.416616896  1.536895246
[176]  1.628675173  1.688131965  1.713215011  1.703866279  1.662163444
[181]  1.592362278  1.500784047  1.395470221  1.285527737  1.180135716
[186]  1.087290415  1.012505969  0.957794359  0.921234116  0.897273368
[191]  0.877671413  0.852796325  0.812954172  0.749516204  0.655752860
[196]  0.527399876  0.363037622  0.164362873 -0.063611221 -0.312461542
[201] -0.570486663 -0.823256397 -1.054784575 -1.249409998 -1.394177898
[206] -1.481139934 -1.508787392 -1.482046460 -1.410850720 -1.307884876
[211] -1.186292429 -1.057925912 -0.932342258 -0.816455618 -0.714638830
[216] -0.629069385 -0.560172739 -0.507075933 -0.468027927 -0.440768662
[221] -0.422841438 -0.411848745 -0.405654043 -0.402533443 -0.401282324

The plot of the sample autocorrelations shows lag 6 and lag 9 outside the confidence bands. However, at a 95% confidence level we would expect 1 out of every 20 lags to be outside the bands. Therefore, per a visual inspection, the residuals appear to be white noise.

Ljung-Box Test for residuals

A visual inspection looks at each autocorrelation separately. The Ljung-Box test is used to test the autocorrelations as a group to determine if the residuals are white noise. It tests the null hypothesis (\(H_0\)) that all autocorrelations (\(\rho\)) are zero (i.e., the residuals are white noise). \[ H_0: \rho_1 = \rho_2 = ... = \rho_K = 0\]

If at least one autocorrelation is not zero, then white noise is not present. \[H_a: at\;least\;one\; \rho_k \neq 0, \, for\,\, 1 \leq k \leq K\] In the tswge package, the residuals are found in the output variable $res. These are calculated within the functions est.ar.wge and est.arma.wge.

The Ljung-Box test yeilds \(p > 0.05\), which fails to reject the null hypothesis and suggests the residuals are white noise.

Obs 0.01236433 0.0226447 -0.00251746 -0.0165913 0.1363686 -0.0238552 -0.03559153 -0.02235308 -0.07725423 -0.03647833 -0.1237183 -0.06269583 -0.05255484 0.01403398 0.0401324 -0.003481309 -0.05531653 -0.005445382 -0.003324064 0.01078744 0.02985522 -0.01474661 0.08468283 -0.04390465 
$test
[1] "Ljung-Box test"

$K
[1] 24

$chi.square
[1] 30.86419

$df
[1] 20

$pval
[1] 0.05700908

As a second check, we use a different K-value, which yields \(p > 0.05\) also suggesting the residuals are white noise.

Obs 0.01236433 0.0226447 -0.00251746 -0.0165913 0.1363686 -0.0238552 -0.03559153 -0.02235308 -0.07725423 -0.03647833 -0.1237183 -0.06269583 -0.05255484 0.01403398 0.0401324 -0.003481309 -0.05531653 -0.005445382 -0.003324064 0.01078744 0.02985522 -0.01474661 0.08468283 -0.04390465 0.01287484 0.01242087 0.009543115 0.04261902 -0.03115811 0.06402467 -0.005110867 0.01372502 0.03931109 0.01792378 0.08116198 -0.007698405 -0.004988617 0.1035932 0.03717967 0.03297893 0.03652906 0.05310496 -0.01074421 -0.01902018 0.05832936 0.01690418 -0.0882948 0.01377552 
$test
[1] "Ljung-Box test"

$K
[1] 48

$chi.square
[1] 53.43477

$df
[1] 44

$pval
[1] 0.155851

Per a visual inspection and the Ljung-Box test, the residuals are white noise.

[1] 73.19725
[1] 23.77057

Conclusion

When fitting an ARMA model to a set of data, the goal is to explain as much of the variability in the data as is reasonably possible.

Christian Nava

6/17/2020